Goto

Collaborating Authors

 strong evidence


Early detection of disease outbreaks and non-outbreaks using incidence data

Gao, Shan, Chakraborty, Amit K., Greiner, Russell, Lewis, Mark A., Wang, Hao

arXiv.org Artificial Intelligence

Forecasting the occurrence and absence of novel disease outbreaks is essential for disease management. Here, we develop a general model, with no real-world training data, that accurately forecasts outbreaks and non-outbreaks. We propose a novel framework, using a feature-based time series classification method to forecast outbreaks and non-outbreaks. We tested our methods on synthetic data from a Susceptible-Infected-Recovered model for slowly changing, noisy disease dynamics. Outbreak sequences give a transcritical bifurcation within a specified future time window, whereas non-outbreak (null bifurcation) sequences do not. We identified incipient differences in time series of infectives leading to future outbreaks and non-outbreaks. These differences are reflected in 22 statistical features and 5 early warning signal indicators. Classifier performance, given by the area under the receiver-operating curve, ranged from 0.99 for large expanding windows of training data to 0.7 for small rolling windows. Real-world performances of classifiers were tested on two empirical datasets, COVID-19 data from Singapore and SARS data from Hong Kong, with two classifiers exhibiting high accuracy. In summary, we showed that there are statistical features that distinguish outbreak and non-outbreak sequences long before outbreaks occur. We could detect these differences in synthetic and real-world data sets, well before potential outbreaks occur.


Balancing central and marginal rejection when combining independent significance tests

Salahub, Chris, Oldford, Wayne

arXiv.org Artificial Intelligence

A common approach to evaluating the significance of a collection of $p$-values combines them with a pooling function, in particular when the original data are not available. These pooled $p$-values convert a sample of $p$-values into a single number which behaves like a univariate $p$-value. To clarify discussion of these functions, a telescoping series of alternative hypotheses are introduced that communicate the strength and prevalence of non-null evidence in the $p$-values before general pooling formulae are discussed. A pattern noticed in the UMP pooled $p$-value for a particular alternative motivates the definition and discussion of central and marginal rejection levels at $\alpha$. It is proven that central rejection is always greater than or equal to marginal rejection, motivating a quotient to measure the balance between the two for pooled $p$-values. A combining function based on the $\chi^2_{\kappa}$ quantile transformation is proposed to control this quotient and shown to be robust to mis-specified parameters relative to the UMP. Different powers for different parameter settings motivate a map of plausible alternatives based on where this pooled $p$-value is minimized.


r/MachineLearning - [D] OpenAI releases GPT-2 1.5B model despite "extremist groups can use GPT-2 for misuse" but "no strong evidence of misuse so far".

#artificialintelligence

We've seen no strong evidence of misuse so far They are going against their own word, but nevertheless, it's nice to see that they are releasing everything. EDIT: The unicorn example added below from https://talktotransformer.com/, which has already been updated with the newest 1.5B parameters model. Input: In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. Output: While there are only a few documented instances of unicorns in the wild, the researchers said the finding proves that there are still large numbers of wild unicorns that remain to be studied.


Network Classification and Categorization

Canning, James P., Ingram, Emma E., Nowak-Wolff, Sammantha, Ortiz, Adriana M., Ahmed, Nesreen K., Rossi, Ryan A., Schmitt, Karl R. B., Soundarajan, Sucheta

arXiv.org Machine Learning

To the best of our knowledge, this paper presents the first large-scale study that tests whether network categories (e.g., social networks vs. web graphs) are distinguishable from one another (using both categories of real-world networks and synthetic graphs). A classification accuracy of $94.2\%$ was achieved using a random forest classifier with both real and synthetic networks. This work makes two important findings. First, real-world networks from various domains have distinct structural properties that allow us to predict with high accuracy the category of an arbitrary network. Second, classifying synthetic networks is trivial as our models can easily distinguish between synthetic graphs and the real-world networks they are supposed to model.